Resources for Multilingual Text Generation in Three Slavic Languages
Abstract
This paper discusses the methods used to reuse a large-scale, broad-coverage English grammar to construct grammars of similar scale for Bulgarian, Czech and Russian, for the fast prototyping of a multilingual generation system. We present (1) the theoretical and methodological basis for resource sharing across languages, and (2) the use of a corpus-based contrastive register analysis, in particular a contrastive analysis of mood and agency. Because the study concerns reuse of the grammar of a language that is typologically quite different from the languages treated, the issues addressed in this paper appear relevant to a wider range of researchers in need of large-scale grammars for less-researched languages.
Similar resources
Multilinguality in a Text Generation System For Three Slavic Languages
This paper describes a multilingual text generation system in the domain of CAD/CAM software instructions for Bulgarian, Czech and Russian. Starting from a language-independent semantic representation, the system drafts natural, continuous text as typically found in software manuals. The core modules for strategic and tactical generation are implemented using the KPML platform for linguistic re...
The MULTEXT-East Morphosyntactic Specifications for Slavic Languages
Word-level morphosyntactic descriptions, such as “Ncmsn” designating a common masculine singular noun in the nominative, have been developed for all Slavic languages, yet there have been few attempts to arrive at a proposal that would be harmonised across the languages. Standardisation adds to the interchange potential of the resources, making it easier to develop multilingual applications or t...
The MULTEXT-East Morphosyntactic Specification for Slavic Languages
Word-level morphosyntactic descriptions, such as “Ncmsn” designating a common masculine singular noun in the nominative, have been developed for all Slavic languages, yet there have been few attempts to arrive at a proposal that would be harmonised across the languages. Standardisation adds to the interchange potential of the resources, making it easier to develop multilingual applications or t...
Handling Word Order in a Multilingual System for Generation of Instructions
Slavic languages are characterized by a relatively high degree of word order freedom. In the process of automatic generation from an underlying representation of the content, we have to ensure that a semantically and contextually appropriate word order is chosen. In this paper, we elucidate information structure as the main factor determining word order in Slavic languages, and we present ...
Multilingual Lexical Database Generation from Parallel Texts in 20 European Languages with Endogenous Resources
This paper deals with multilingual database generation from parallel corpora. The idea is to contribute to the enrichment of lexical databases for languages with few linguistic resources. Our approach is endogenous: it relies on the raw texts only, it does not require external linguistic resources such as stemmers or taggers. The system produces alignments for the 20 European languages of the ‘...
Publication date: 2000